
DNA Research

Oxford University Press (OUP)

All preprints, ranked by how well they match DNA Research's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Improvements to the Gulf Pipefish Syngnathus scovelli Genome

Ramesh, B.; Small, C.; Healey, H.; Johnson, B.; Barker, E.; Currey, M.; Bassham, S.; Myers, M.; Cresko, W.; Jones, A.

2023-01-24 genomics 10.1101/2023.01.23.525209 medRxiv
Top 0.1%
25.8%

The Gulf pipefish Syngnathus scovelli has emerged as an important species in the study of sexual selection, development, and physiology, among other topics. The fish family Syngnathidae, which includes pipefishes, seahorses, and seadragons, has become an increasingly attractive target for comparative research in ecological and evolutionary genomics. These endeavors depend on having a high-quality genome assembly and annotation. However, the first version of the S. scovelli genome assembly was generated by short-read sequencing and annotated using a small set of RNA-sequence data, resulting in limited contiguity and a relatively poor annotation. Here, we present an improved genome assembly and an enhanced annotation, resulting in a new official gene set for S. scovelli. By using PacBio long-read high-fidelity (HiFi) sequences and a proximity ligation (Hi-C) library, we fill small gaps and join the contigs to obtain 22 chromosome-level scaffolds. Compared to the previously published genome, the gaps in our new assembly are smaller, the N75 is much larger (13.3 Mb), and the new genome is around 95% BUSCO complete. The precision of the gene models produced by NCBI's eukaryotic annotation pipeline was enhanced by using a large body of RNA-Seq reads from different tissue types, leading to the discovery of 28,162 genes, of which 8,061 were non-coding genes. This new genome assembly and its annotation are tagged as a RefSeq genome by NCBI and thus provide substantially enhanced genomic resources for future research involving S. scovelli.

2
Telomere-to-Telomere and Haplotype-Phased Genome Assemblies of the Heterozygous Octoploid 'Florida Brilliance' Strawberry (Fragaria x ananassa)

HAN, H.; Barbey, C. R.; Fan, Z.; Verma, S.; Whitaker, V.; Lee, S.

2022-10-07 genomics 10.1101/2022.10.05.509768 medRxiv
Top 0.1%
22.3%

The available haplotype-resolved allo-octoploid strawberry (Fragaria x ananassa Duch.) (2n = 8x = 56) genomes were assembled with the trio-binning pipeline, supplied with parental short reads. We report here a high-quality, haplotype-phased genome assembly of a short-day cultivar, Florida Brilliance (FaFB2), without the use of parental sequences. Using Pacific Biosciences (PacBio) long reads and high-throughput chromosome conformation capture (Hi-C) data, we completed telomere-to-telomere phased genome assemblies of both haplotypes. The N50 values of the two haploid assemblies were 23.7 Mb and 26.6 Mb before scaffolding and gap-filling. All 56 pseudochromosomes from the phased-1 and phased-2 assemblies contained putative telomere sequences at the 5′ and/or 3′ ends. A high level of collinearity between the haplotypes was confirmed by high-density genetic linkage mapping with 10,269 SNPs, and a high level of collinearity with the Royal Royce FaRR1 reference genome was also observed. Genome completeness was further confirmed by consensus quality. The LTR Assembly Index (LAI) score for the entire genome assembly was 19.72. Moreover, BUSCO analysis detected over 99% of conserved genes in the combined phased-1 and phased-2 assembly. Both haploid assemblies were annotated using Iso-Seq data from six different Florida Brilliance tissues and RNA-Seq data representing various F. x ananassa tissues from the NCBI Sequence Read Archive, resulting in a total of 104,099 genes. This telomere-to-telomere reference genome of Florida Brilliance will advance our knowledge of strawberry genome evolution and gene function, and facilitate the development of new breeding tools and approaches.
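Checking whether a pseudochromosome carries telomeric repeats at its ends can be sketched as a simple motif scan. This is a minimal illustration, not the authors' pipeline: it assumes the plant-type telomeric repeat TTTAGGG (the abstract does not name the motif), and the function names, window size and copy-number threshold are hypothetical choices.

```python
import re

# Assumption: plant-type telomeric repeat TTTAGGG, read as CCCTAAA on the
# opposite strand. The motif is not stated in the abstract above.
FORWARD_MOTIF = "TTTAGGG"
REVERSE_MOTIF = "CCCTAAA"


def count_tandem_units(window: str, motif: str) -> int:
    """Count motif copies that sit inside runs of at least 3 tandem copies."""
    pattern = re.compile(f"(?:{motif}){{3,}}")
    return sum(len(m.group(0)) // len(motif) for m in pattern.finditer(window))


def telomere_ends(seq: str, window: int = 10_000, min_units: int = 50) -> tuple[bool, bool]:
    """Return (start_has_telomere, end_has_telomere) for one pseudochromosome.

    On the forward strand, the chromosome start shows CCCTAAA repeats and the
    chromosome end shows TTTAGGG repeats. The window size and min_units
    threshold are illustrative values, not taken from the study.
    """
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]
    start_hit = count_tandem_units(head, REVERSE_MOTIF) >= min_units
    end_hit = count_tandem_units(tail, FORWARD_MOTIF) >= min_units
    return start_hit, end_hit
```

A pseudochromosome scoring (True, True) would be a candidate telomere-to-telomere sequence; only one end flagged corresponds to the "5′ and/or 3′ ends" wording above.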

3
Decoding the Centromeric Region with a Near Complete Genome Assembly of the Oshima Cherry Cerasus speciosa

Fujiwara, K.; Toyoda, A.; Biswa, B. B.; Kishida, T.; Tsuruta, M.; Nakamura, Y.; Kimura, N.; Kawamoto, S.; Sato, Y.; Katsuki, T.; Sakura 100 Genome Consortium; Koide, T.

2024-06-22 genomics 10.1101/2024.06.17.599445 medRxiv
Top 0.1%
22.2%

The Oshima cherry (Cerasus speciosa), which is endemic to Japan, has significant cultural and horticultural value. In this study, we present a near complete telomere-to-telomere genome assembly for C. speciosa, derived from the old-growth "Sakurakkabu" tree on Izu Oshima Island. Using Illumina short-read, PacBio long-read, and Hi-C sequencing, we constructed a 269.3 Mbp genome assembly with a contig N50 of 32.0 Mbp. We examined the distribution of repetitive sequences in the assembled genome and identified regions that appeared to be centromeric. Detailed structural analysis of these putative centromeric regions revealed that the centromeric regions of C. speciosa comprised repetitive sequences with monomer lengths of 166 or 167 bp. Comparative genomic analysis with Prunus sensu lato genomes revealed structural variations and conserved syntenic regions. This high-quality reference genome provides a crucial tool for studying the genetic diversity and evolutionary history of Cerasus species, facilitating advancements in horticultural research and the preservation of this iconic species.
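The monomer length of a satellite array, such as the 166 to 167 bp centromeric repeat above, can be estimated by finding the shift at which the sequence best matches itself. The sketch below is a naive autocorrelation scan written for illustration; the abstract does not say which tool was used, and the function name and search range are assumptions. Dedicated tandem-repeat finders tolerate indels and monomer divergence far better.

```python
import random


def dominant_period(seq: str, min_period: int, max_period: int) -> int:
    """Return the smallest shift p maximizing the identity of seq[i] and seq[i + p].

    Naive O(n * range) autocorrelation scan over candidate periods.
    """
    best_period, best_score = min_period, -1.0
    for p in range(min_period, max_period + 1):
        pairs = len(seq) - p
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if seq[i] == seq[i + p])
        score = matches / pairs
        # Strict ">" keeps the smallest period when multiples (2p, 3p, ...) tie.
        if score > best_score:
            best_period, best_score = p, score
    return best_period


# Hypothetical example: a random 166 bp monomer repeated 20 times.
rng = random.Random(0)
monomer = "".join(rng.choice("ACGT") for _ in range(166))
array = monomer * 20
print(dominant_period(array, 100, 400))
```

On the synthetic array both 166 and 332 match perfectly; the strict comparison reports the monomer length rather than a multiple of it.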

4
Chromosome-scale de novo diploid assembly of the apple cultivar 'Gala Galaxy'

Broggini, G. A. L.; Schlathölter, I.; Russo, G.; Copetti, D.; Yates, S. A.; Studer, B.; Patocchi, A.

2020-04-25 genomics 10.1101/2020.04.25.058891 medRxiv
Top 0.1%
19.1%

Apple (Malus x domestica) is one of the most important fruit crops in terms of worldwide production. Due to its self-incompatibility system and the long juvenile period, breeding of new apple cultivars combining traits desired by growers (e.g. yield, pest and disease resistance) and consumers (e.g. fruit size, color, and flavor) is a long and complex process. Genomics-assisted breeding strategies can facilitate the selection of germplasm leading to new cultivars. While the most complete apple genome assemblies available to date are from anther-derived homozygous lines, de novo assembly of apple genomes encompassing the natural heterozygosity remains challenging. Using long- and short-read sequencing technologies in combination with optical mapping, we de novo assembled a diploid and heterozygous genome of the apple cultivar Gala Galaxy. This approach resulted in 154 hybrid scaffolds (N50 = 34.3 Mb) spanning 999.9 Mb and in 414.7 Mb of unscaffolded sequences. Anchoring 31 scaffolds with a genetic map was sufficient to represent an entire haploid genome of 17 pseudomolecules (719.4 Mb). The remaining sequences were assembled in a second set of 17 pseudomolecules, which spanned 601 Mb, leaving 80.6 Mb of unplaced sequences. A total of 41,264 genes were annotated using 74,900 transcripts derived from RNA sequencing of pooled leaf tissue samples. This study provides a high-quality diploid reference genome sequence encompassing the natural heterozygosity of the widely popular cultivar Gala Galaxy. The DNA sequence resources and the assembly described here will serve as a solid foundation for fundamental and applied apple breeding research.

5
The draft genome sequence of Eucalyptus polybractea based on hybrid assembly with short and long reads

Li, T.; Kainer, D.; Foley, W. J.; Rodrigo, A.; Kuelheim, C.

2021-05-18 genomics 10.1101/2021.05.18.444652 medRxiv
Top 0.1%
19.1%

Eucalyptus polybractea is a small, multi-stemmed tree, which is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome utilizing both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83x and 15x genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with an accumulated length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes detected by combining homology-based and de novo approaches. We have provided the first assembled genome based on hybrid sequences from the highly diverse Eucalyptus subgenus Symphyomyrtus, and revealed the value of including long-reads from Nanopore technology for enhancing the contiguity of the assembled genome, as well as for improving its completeness. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
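Fold coverage is total sequenced bases divided by genome size. The abstract does not state the genome size behind its coverage figures; back-calculating from 44 Gb at about 83x gives roughly 530 Mb, close to the 523 Mb assembly. A minimal sketch of that arithmetic, with illustrative helper names (the yields are the rounded values quoted above):

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a reported yield and its fold coverage."""
    return total_bases / coverage


GB, MB = 1e9, 1e6
size = implied_genome_size(44 * GB, 83)  # roughly 530 Mb
print(f"implied genome size: {size / MB:.0f} Mb")       # 530 Mb
print(f"Illumina: {fold_coverage(44 * GB, size):.1f}x")  # 83.0x
print(f"Nanopore: {fold_coverage(8 * GB, size):.1f}x")   # 15.1x
```

Dividing the same yields by the 523 Mb assembly length instead gives about 84x and 15x, so the reported figures are consistent either way.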

6
Chromosome-scale genome assembly of Eustoma grandiflorum, the first complete genome sequence in family Gentianaceae

Shirasawa, K.; Arimoto, R.; Hirakawa, H.; Ishimorai, M.; Ghelfi, A.; Miyasaka, M.; Endo, M.; Kawabata, S.; Isobe, S.

2021-09-11 genomics 10.1101/2021.09.09.459690 medRxiv
Top 0.1%
18.6%

Eustoma grandiflorum (Raf.) Shinn. is an annual herbaceous plant native to the southern United States, Mexico, and the Greater Antilles. It bears large flowers in a variety of colors and is an important flower crop. In this study, we established a chromosome-scale de novo assembly of E. grandiflorum by integrating four genomic and genetic approaches: (1) Pacific Biosciences (PacBio) Sequel deep sequencing, (2) error correction of the assembly with Illumina short reads, (3) scaffolding by chromatin conformation capture sequencing (Hi-C), and (4) genetic linkage maps derived from an F2 mapping population. Thirty-six pseudomolecules and 64 unplaced scaffolds were created, with a total length of 1,324.8 Mb. Full-length transcripts were sequenced with PacBio Iso-Seq for gene prediction on the assembled genome, Egra_v1. A total of 36,619 high-confidence (HC) genes were predicted on the genome, of which 25,936 were functionally annotated with ZenAnnotation. Genetic diversity analysis was also performed for nine commercial E. grandiflorum varieties bred in Japan, identifying 254,205 variants. This is the first report of reference genome sequences for E. grandiflorum and for the family Gentianaceae.

7
Green Elegance T2T: An upgraded telomere-to-telomere genome assembly and annotation of looseleaf lettuce (Lactuca sativa var. crispa)

Zhang, B.; Liu, X.; Ding, H.; Yang, Y.; Tang, J.; Li, D.

2024-11-21 genomics 10.1101/2024.11.20.624396 medRxiv
Top 0.1%
18.6%

Lettuce (Lactuca sativa L.), a globally important vegetable crop, is cultivated in various horticultural varieties, and looseleaf lettuce is one of the main types consumed. Here, we report a telomere-to-telomere (T2T) genome assembly of looseleaf lettuce (L. sativa var. crispa cv. Green Elegance). Combining the sequencing data of the previously reported Green Elegance v1.2 with 42.20 Gb of clean ultra-long reads produced by Oxford Nanopore Technologies, we closed the gaps and upgraded the genome assembly. After filling the 11 gaps in Green Elegance v1.2, the final gapless telomere-to-telomere genome assembly is 2.58 Gb in length with a contig N50 of 282.47 Mb, containing nine centromeres and 18 telomeres. The genome contains 41,375 protein-coding genes, of which 99.10% were functionally annotated. This T2T genome of looseleaf lettuce will be valuable for identifying genetic variation and advancing lettuce breeding.

8
The Haplotype-Resolved and Chromosome-Scale Genome of Vaccinium stamineum: A New Source of Genetic Variability for Blueberry Breeding

Matsumoto, G. O.; Benevenuto, J.; Munoz, P. R.

2025-03-25 genomics 10.1101/2025.03.21.644558 medRxiv
Top 0.1%
18.6%

The Vaccinium genus comprises several commercially important fruit crops, such as blueberry, lingonberry, bilberry, and cranberry. However, past breeding efforts have primarily drawn on a limited number of wild relatives as sources of genetic variability, leaving a vast genetic reservoir untapped. In this study, we present the first haplotype-phased reference genome for V. stamineum, a blueberry wild relative with potential for de novo domestication and introgression into breeding programs. V. stamineum is particularly notable for several agronomic traits of interest, such as high soluble sugar content, a distinctive flavor profile, and anthocyanin accumulation in the fruit pulp, a trait absent in cultivated blueberries. Our assemblies revealed 12 pseudomolecules corresponding to the base chromosome number of Vaccinium species, with genome sizes of 529.16 Mb and 493.82 Mb for the primary and secondary haplotypes, respectively. Despite a slightly smaller genome than other Vaccinium species, V. stamineum exhibited a higher number of predicted protein-coding genes, while repetitive elements comprised 39.77% and 42.38% of the primary and secondary haplotypes, respectively. BUSCO analysis indicated 97% transcriptome completeness, supporting the accuracy of gene annotation. Genome-wide alignments showed that the V. stamineum haplotypes were highly collinear with each other, as well as with V. corymbosum. However, further validation is required to resolve a putative translocation in chromosome 1 of the primary haplotype. Altogether, this study establishes a genomic framework that will facilitate the introgression of traits of interest into blueberry breeding programs and support the potential domestication of V. stamineum as a novel fruit crop.
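Repeat content figures such as the 39.77% and 42.38% reported above are shares of assembled bases covered by repeat annotations, where overlapping annotations must be counted only once. The sketch assumes repeats are given as half-open [start, end) intervals on a single sequence (for example, parsed from a repeat-masking report); the function names are hypothetical, and this is not necessarily how the authors computed their figures.

```python
def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Total bases covered by half-open [start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence of genome_length bases covered by repeat intervals."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return merged_length(intervals) / genome_length
```

Summing raw annotation lengths instead would overstate repeat content wherever annotations nest or overlap.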

9
Chromosome-level genome construction of a Japanese stickleback species through ultra-dense linkage analysis using single-cell sequencing of sperm

Yoshitake, K.; Ishikawa, A.; Yonezawa, R.; Kinoshita, S.; Kitano, J.; Asakawa, S.

2020-05-14 genomics 10.1101/2020.05.12.092221 medRxiv
Top 0.1%
18.4%

The availability of high-quality, chromosome-level genomes is very useful in the search for the causal genes of mutants and in genetic breeding. The advent of next-generation sequencers has made it easier to decode genomes, but it is still difficult to construct the genomes of higher organisms. To construct the genome of a higher organism, the assembled and scaffolded sequence is extended to chromosome length by linkage analysis. However, with conventional linkage analysis it was difficult to build a high-density linkage map, and organisms without an established breeding system could not be analyzed. As an innovative alternative to conventional linkage analysis, we devised a method for genotyping sperm using 10x single-cell genome (CNV) sequencing libraries to generate a linkage map without crossing individuals. The genome was constructed using sperm from Gasterosteus nipponicus, and single-cell genotyping yielded a very dense set of 1,864,430 heterozygous SNPs. The average coverage per sperm cell was 0.13x. The number of sperm used was 1,738, which is an order of magnitude higher than the number of sperm used for conventional linkage analysis. We improved the linkage analysis tool SELDLA (Scaffold Extender with Low Depth Linkage Analysis) so that it can analyze data with the characteristics of single-cell genotyping data. Finally, we were able to determine the location and orientation on the chromosome for 85.6% of the contigs in the 456 Mbase genome of Gasterosteus nipponicus, which was sequenced with nanopore technology. A total of 95.6% of the contigs in which a cross-reaction was detected within the contigs.
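Because each sperm cell is haploid, a crossover between two markers appears as a switch from one parental haplotype to the other within a single cell, so the recombination fraction can be read off directly without crossing individuals. The sketch below illustrates that idea and converts the fraction to a map distance with the Haldane mapping function. It is not SELDLA's algorithm; the "A"/"B" genotype encoding, with None for markers left uncovered at very low per-cell depth, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence

# Hypothetical encoding: "A" or "B" = parental haplotype; None = marker not covered.
Genotype = Optional[str]


def recombination_fraction(m1: Sequence[Genotype], m2: Sequence[Genotype]) -> float:
    """Fraction of sperm cells whose haplotype differs between two markers.

    Cells lacking a call at either marker are skipped, since sparse per-cell
    coverage leaves most markers uncalled in any single cell.
    """
    informative = [(a, b) for a, b in zip(m1, m2) if a is not None and b is not None]
    if not informative:
        raise ValueError("no cell has calls at both markers")
    recombinant = sum(1 for a, b in informative if a != b)
    return recombinant / len(informative)


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans: d = -50 * ln(1 - 2r)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)
```

With sparse calls, the number of informative cells per marker pair, rather than the total number of sperm, limits the precision of each estimate, which is one reason large cell counts matter.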

10
Chromosome-scale assembly and annotation of the wild wheat relative Aegilops comosa

Li, H.; Rehman, S. u.; Song, R.; Qiao, L.; Hao, X.; Zhang, J.; Li, K.; Hou, L.; Hu, W.; Wang, L.; Chen, S.

2024-10-17 genomics 10.1101/2024.10.15.618371 medRxiv
Top 0.1%
18.3%

Wild relatives of wheat are valuable sources for enhancing the genetic diversity of common wheat. Aegilops comosa, an annual diploid species with an MM genome constitution, possesses numerous agronomically valuable traits that can be exploited for wheat improvement. In this study, we report a chromosome-level genome assembly of Ae. comosa accession PI 551049, generated using PacBio high-fidelity (HiFi) reads and high-throughput chromosome conformation capture (Hi-C) data. The assembly spans 4.47 Gb, featuring a contig N50 of 23.59 Mb and a scaffold N50 of 619.05 Mb. A total of 39,063 gene models were annotated through a combination of homoeologous proteins, Iso-Seq, and RNA-Seq data. Comparative genome analysis revealed a terminal intrachromosomal translocation in chromosome 2M of Ae. comosa (and Ae. umbellulata) compared to its homoeologous chromosomes in other diploid wheat species. Phylogenetic analysis showed a close relationship between Ae. comosa and Ae. umbellulata. This newly constructed reference genome of Ae. comosa will serve as an important genomic resource for comparative genomic studies and the cloning of agriculturally important genes.

11
A draft genome of grass pea (Lathyrus sativus), a resilient diploid legume

Emmrich, P. M. F.; Sarkar, A.; Njaci, I.; Kaithakottil, G. G.; Ellis, N.; Moore, C.; Edwards, A.; Heavens, D.; Waite, D.; Cheema, J.; Trick, M.; Moore, J.; Webb, A.; Calazzo, R.; Thomas, J.; Higgins, J.; Swarbreck, D.; Kumar, S.; Mundree, S.; Loose, M. W.; Yant, L.; Martin, C.; Wang, T. L.

2020-04-27 genomics 10.1101/2020.04.24.058164 medRxiv
Top 0.1%
18.0%

We have sequenced the genome of grass pea (Lathyrus sativus), a resilient diploid (2n=14) legume closely related to pea (Pisum sativum). We determined the genome size of the sequenced European accession (LS007) as 6.3 Gbp. We generated two assemblies of this genome, i) EIv1 using Illumina PCR-free paired-end sequencing and assembly followed by long-mate-pair scaffolding and ii) Rbp using Oxford Nanopore Technologies long-read sequencing and assembly followed by polishing with Illumina paired-end data. EIv1 has a total length of 8.12 Gbp (including 1.9 billion Ns) and a scaffold N50 of 59.7 kbp. Annotation has identified 33,819 high confidence genes in the assembly. Rbp has a total length of 6.2 Gbp (with no Ns) and a contig N50 of 155.7 kbp. Gene space assessment using the eukaryote BUSCO database showed completeness scores of 82.8% and 89.8%, respectively.
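Assembly gaps are conventionally written as runs of N, which is why EIv1's 8.12 Gbp total shrinks to about 6.2 Gbp once its 1.9 billion Ns are excluded, in line with the gap-free Rbp assembly. A minimal sketch of counting N bases and gap runs, assuming gaps appear only as N/n runs in the sequence itself (some pipelines describe gaps separately, for example in AGP files); the function name and the min_gap parameter are hypothetical.

```python
import re


def gap_stats(seq: str, min_gap: int = 1) -> dict:
    """Count N bases and gap runs in one sequence.

    n_bases counts every N/n; gap_count counts only runs of at least min_gap
    bases, since isolated Ns are often ambiguous calls rather than gaps.
    """
    runs = [m.group(0) for m in re.finditer(r"[Nn]+", seq)]
    total_n = sum(len(r) for r in runs)
    gaps = [len(r) for r in runs if len(r) >= min_gap]
    return {
        "length": len(seq),
        "n_bases": total_n,
        "n_fraction": total_n / len(seq) if seq else 0.0,
        "gap_count": len(gaps),
    }
```

Subtracting n_bases from length gives the amount of real sequence, the quantity that makes scaffold-based and gap-free assemblies comparable.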

12
Improved genome assembly of double haploid Prunus persica siblings Lovell 2D and Lovell 5D and the peach NLRome

Gottschalk, C.; Brock, J. R.; Mansfeld, B. N.; Main, D.; Jung, S.; Zheng, P.; Vann, C.; Demuth, M.; Bennett, D.; Liu, Z.; Dardick, C.

2025-06-06 genomics 10.1101/2025.06.03.657495 medRxiv
Top 0.1%
17.9%

Prunus persica (peach) has long served as a model fruit tree for studying phenological events. It has a relatively small genome and exhibits tremendous plasticity in climate tolerance owing to high variation in chill requirements, bloom times, and fruit ripening times. The peach variety Lovell 2D was used to generate one of the first high-quality genome assemblies for a tree species, using Sanger sequencing of genetic-map-ordered BAC clones. A key to the high quality of this early assembly was the use of a doubled haploid variety, which eliminates the challenges posed by mixed haplotypes. Here, we re-sequenced and assembled the Lovell 2D genome along with a doubled haploid sibling, Lovell 5D, using third-generation technologies. The resulting genomes were significantly more contiguous than the current Lovell 2D reference genome (v2.0, updated in 2017) and are closer to the estimated total genome size for peach (265 Mb). In addition, new gene, transposable element (TE), and nucleotide-binding domain and leucine-rich repeat receptor (NLR) annotations were generated to enhance the integrity and utility of the genome. These updated peach doubled haploid reference assemblies will provide the research community with an improved reference genome for genomics-guided studies and breeding efforts.

13
A new genome assembly of the pea cultivar Cameor provides resources for functional genomics and genetics

Kreplak, J.; Novak, P.; Robledillo, L. A.; AUBERT, G.; Imbert, B.; Kaur, P.; Gouil, Q.; Lopez-Roques, C.; Rodde, N.; BOUCHEZ, O.; Tayeh, N.; Macas, J.; Burstin, J.

2025-04-02 genomics 10.1101/2025.04.01.645976 medRxiv
Top 0.1%
17.3%

Significant improvements in sequencing technologies have allowed the development of more contiguous genome assemblies in many plant species. The pea genome is characterized by its richness in repeated elements and its long and complex centromeres, which make its assembly challenging. In this paper, we present an improved version of the genome sequence of the French cultivar Cameor. This sequence was obtained by combining Nanopore and PacBio long-read sequencing, Hi-C contact maps and Bionano maps. The assembly of centromeres was refined using a combination of FISH and ultra-long Nanopore read analyses. Overall, the Cameor_v2 genome assembly is a highly contiguous pea genome assembly with a small total gap size and a large contig N50. In this version, the orientation of chromosomes was revised according to internationally accepted karyotype rules. Gene annotation statistics indicated high completeness of gene sequences, with most genes carrying both 5′ and 3′ UTRs. This genome assembly and its associated data constitute a useful resource for pea genetics, comparative mapping and functional genomics.

14
Haploid-resolved and chromosome-scale genome assembly in hexa-autoploid sweetpotato (Ipomoea batatas (L.) Lam)

Yoon, U.-H.; Cao, Q.; Shirasawa, K.; Zhai, H.; Lee, T.-H.; Tanaka, M.; Hirakawa, H.; Hahn, J.-H.; Wang, X.; Kim, H. S.; Tabuchi, H.; Zhang, A.; Kim, T.-H.; Nagasaki, H.; Xiao, S.; Okada, Y.; Jeong, J. C.; Nagano, S.; Shin, Y.; Lee, H.-U.; Park, S.-U.; Lee, S. J.; Lee, K.; Yang, J.-W.; Ahn, B. O.; Ma, D.; Takahata, Y.; Kwak, S.-S.; Liu, Q.; Isobe, S.

2022-12-25 genomics 10.1101/2022.12.25.521700 medRxiv
Top 0.1%
17.1%

Sweetpotato (Ipomoea batatas (L.) Lam) is the world's seventh most important food crop by production quantity. Cultivated sweetpotato is a hexaploid (2n = 6x = 90), and its genome (B1B1B2B2B2B2) is quite complex due to polyploidy, self-incompatibility, and high heterozygosity. Here we established a haploid-resolved and chromosome-scale de novo assembly of autohexaploid sweetpotato genome sequences. Before constructing this genome, we created chromosome-scale genome sequences of I. trifida using a highly homozygous accession, Mx23Hm, with PacBio RSII and Hi-C reads. Haploid-resolved genome assembly was performed for the sweetpotato cultivar Xushu18 by hybrid assembly of Illumina paired-end (PE) and mate-pair (MP) reads, 10X Genomics reads, and PacBio RSII reads. Then, 90 chromosome-scale pseudomolecules were generated by aligning the scaffolds onto a sweetpotato linkage map. De novo assemblies were also performed for the chloroplast and mitochondrial genomes of I. trifida and sweetpotato. In total, 34,386 and 175,633 genes were identified on the assembled nuclear genomes of I. trifida and sweetpotato, respectively. Functional gene annotation and RNA-Seq analysis revealed the locations of starch, anthocyanin, and carotenoid pathway genes on the sweetpotato genome. This is the first report of a chromosome-scale de novo assembly of the sweetpotato genome. The results are expected to contribute to genomic and genetic analyses of sweetpotato.

15
Chromosomal-level genome assembly of Populus adenopoda

Liu, S.; Wang, Z.; Shi, T.; Dan, X.; Zhang, Y.; Liu, J.; Wang, J.

2023-07-12 genomics 10.1101/2023.07.11.548479 medRxiv
Top 0.1%
16.8%

High-quality reference genomes for several species have promoted breeding and functional studies of poplar trees. By resequencing numerous accessions of these and closely related species, single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels) have been identified that help clarify local adaptation and phenotypic diversification. Here, a chromosome-level genome assembly of P. adenopoda was generated from Illumina and PacBio sequencing data and scaffolded with Hi-C. The assembled genome is about 383 Mb, with 99.70% of the contigs anchored to 19 pseudochromosomes, and a total of 33,505 protein-coding genes were annotated. This high-quality genome provides a genomic basis for the subsequent detection of various variants.

16
Chromosome scale reference genome of Cluster bean (Cyamopsis tetragonoloba (L.) Taub.)

Gaikwad, K.; Ramakrishna, G.; Srivastava, H.; Saxena, S.; Kaila, T.; Tyagi, A.; Sharma, P.; Sharma, S.; Sharma, R.; Mahla, H.; SV, A. M.; Solanke, A.; Kalia, P.; Rao, A.; Rai, A.; Sharma, T.; Singh, N.

2020-05-18 genomics 10.1101/2020.05.16.098434 medRxiv
Top 0.1%
14.7%

Clusterbean (Cyamopsis tetragonoloba (L.) Taub.), also known as guar, is a widely cultivated dryland legume of Western India and parts of Africa. Apart from being a vegetable crop, it is also an abundant source of a natural heteropolysaccharide called guar gum or galactomannan, which is widely used in cosmetics, pharmaceuticals, food processing, shale gas drilling, etc. Here, we report for the first time a chromosome-scale reference genome assembly of clusterbean from RGC-936, a popular guar cultivar with high galactomannan content, by combining sequence reads from Illumina, 10x Chromium and Oxford Nanopore technologies. An initial assembly of 1,580 scaffolds with an N50 of 7.12 Mbp was generated, and the final genome assembly was then obtained by anchoring these scaffolds to a high-density SNP map. The final assembly spans 550.31 Mbp in 7 pseudomolecules corresponding to the 7 chromosomes, with a very high N50 of 78.27 Mbp. We predicted 34,680 protein-coding genes in the guar genome. The high-quality chromosome-scale clusterbean genome assembly will facilitate understanding of the molecular basis of galactomannan biosynthesis and aid genomics-assisted breeding of superior cultivars.

17
An improved reference of the grapevine genome supports reasserting the origin of the PN40024 highly-homozygous genotype

Velt, A.; Frommer, B.; Blanc, S.; Holtgräwe, D.; Duchene, E.; Dumas, V.; Grimplet, J.; Hugueney, P.; Lahaye, M.; Kim, C.; Matus, J. T.; Navarro-Paya, D.; Orduna, L.; Tello-Ruiz, M. K.; Vitulo, N.; Ware, D.; Rustenholz, C.

2022-12-22 genomics 10.1101/2022.12.21.521434 medRxiv
Top 0.1%
14.7%

The genome sequence assembly of the diploid and highly homozygous V. vinifera genotype PN40024 serves as the reference for many grapevine studies. Despite several improvements to the PN40024 genome assembly, its current version, PN12X.v2, is quite fragmented and represents only the haploid state of the genome with mixed haplotypes. Indeed, although the PN40024 genome is nearly homozygous, it still contains various heterozygous regions. Taking advantage of the ability of long-read sequencing technologies to fully discriminate haplotype sequences, and considering that several Vitis sp. genomes have recently been assembled with these approaches, an improved version of the reference, called PN40024.v4, was generated. Incorporating long genomic sequencing reads into the assembly greatly increased the continuity of the 12X.v2 scaffolds: the number of scaffolds decreased from 2,059 to 640, and the number of N bases was reduced by 88%. Additionally, the full alternative haplotype sequence was built for the first time, chromosome anchoring was improved, and the number of unplaced scaffolds was halved. To obtain a high-quality gene annotation that outperforms previous versions, a liftover approach was complemented with an optimized annotation workflow for Vitis. Integration of the gene reference catalogue and its manual curation also helped improve the annotation, yielding the most reliable estimate to date of 35,230 genes. Finally, we demonstrate that PN40024 resulted from selfings of cv. Helfensteiner (a cross of cv. Pinot noir and Schiava grossa) rather than of a single Pinot noir. These advances will help maintain the PN40024 genome as a gold-standard reference and contribute to the eventual construction of the grapevine pangenome.

18
Inference of a genome-wide protein-coding gene set of the inshore hagfish Eptatretus burgeri

Yamaguchi, K.; Hara, Y.; Kaori, T.; Nishimura, O.; Smith, J.; Kadota, M.; Kuraku, S.

2020-07-26 genomics 10.1101/2020.07.24.218818 medRxiv
Top 0.1%
14.5%

The group of hagfishes (Myxiniformes) arose from agnathan (jawless vertebrate) lineages and is one of only two extant cyclostome taxa, together with lampreys (Petromyzontiformes). Although whole-genome sequencing has been achieved for diverse vertebrate taxa, genome-wide sequence information has remained highly limited for cyclostomes. Here we sequenced the genome of the inshore hagfish Eptatretus burgeri on a short-read sequencing platform, using DNA extracted from the testis, with the aim of reconstructing a high-coverage catalogue of coding genes. The resulting genome assembly, scaffolded with mate-pair reads and paired RNA-seq reads, had a scaffold N50 of 293 Kbp, which allowed genome-wide prediction of coding genes. Assessed against evolutionarily conserved single-copy orthologs, the resulting gene models covered more than 83% of these orthologs completely and more than 93% at least partially. The high contiguity of the assembly and the completeness of the resulting gene models promise high utility in various comparative analyses, including phylogenomics and phylome exploration.

19
The native mussel Mytilus chilensis genome reveals adaptive molecular signatures facing the marine environment

Gallardo-Escarate, C.; Valenzuela-Munoz, V.; Nunez-Acuna, G.; Valenzuela-Miranda, D.; Tapia, F.; Yevenes, M.; Gajardo, G.; Toro, J. E.; Oyarzun, P. A.; Arriagada, G.; Novoa, B.; Figueras, A.; Roberts, S.; Gerdol, M.

2022-09-07 genomics 10.1101/2022.09.06.506863 medRxiv
Top 0.1%
14.5%

The blue mussel Mytilus chilensis is a key socioeconomic species inhabiting the southern coast of Chile. This endemic marine mussel supports a booming aquaculture industry, which relies entirely on seeds collected artificially from natural beds and translocated to sites with diverse physical-chemical ocean conditions for farming. Furthermore, mussel production is threatened by a broad range of microorganisms, pollution, and environmental stressors that eventually impact its survival and growth. Understanding the genomic basis of local adaptation is therefore pivotal to developing sustainable shellfish aquaculture. We present a high-quality reference genome of M. chilensis, the first chromosome-level genome for a Mytilidae member in South America. The assembled genome size was 1.93 Gb, with a contig N50 of 134 Mb. Through Hi-C proximity ligation, 11,868 contigs were clustered, ordered, and assembled into 14 chromosomes, in congruence with the karyological evidence. The M. chilensis genome comprises 34,530 genes and 4,795 non-coding RNAs. In total, 57% of the genome consists of repetitive sequences, dominated by LTR retrotransposons and unknown elements. Comparative genome analysis between M. chilensis and M. coruscus revealed genic rearrangements distributed across the whole genome. Notably, Steamer-like elements associated with horizontally transmissible cancer were explored in reference genomes, suggesting putative phylogenetic relationships at the chromosome level in Bivalvia. Genome expression analysis was also conducted, showing putative genomic differences between two ecologically different mussel populations. Collectively, the evidence suggests that local genome adaptation can be analyzed to develop sustainable mussel production. The genome of M. chilensis provides pivotal molecular knowledge for the evolution of the Mytilus complex and will help in understanding how climate change can impact mussel biology.

20
A chromosome-level reference genome for Pacific herring (Clupea pallasii) from the Bering Sea

Timm, L. E.; Hsieh, Y.; Lopez, J. A.; Almgren, S. A.; Glass, J. R.

2026-02-10 genomics 10.64898/2026.02.09.704930 medRxiv
Top 0.1%
14.4%

Pacific herring (Clupea pallasii) serve as a critical trophic link between plankton and many marine species targeted by fisheries. With a broad distribution throughout the North Pacific Ocean, from the Arctic to temperate latitudes, herring hold ecological, economic, and cultural importance. Despite this importance, genomic resources for this species, such as reference genome sequences, have only recently become available. To date, only one scaffold-level reference genome, representing a specimen from the Gulf of Alaska (Vancouver; 1,379 scaffolds), has been published to NCBI. Addressing this data gap, we produced a high-quality 795 Mb genome sequence organized into 26 chromosomes by combining long-read sequencing with short-read sequencing of proximity ligation libraries. Our assembly is highly complete (BUSCO score of 97.7%) and contiguous (922 contigs, N50 = 7,338,470 bp, L50 = 38; 26 scaffolds, N50 = 31,494,017 bp, L50 = 12). Pacific herring south of the Aleutian Islands and the Alaska Peninsula are genetically differentiated from those in the Bering Sea, making a reference genome from the eastern Bering Sea an important addition to the genomic toolbox for Pacific herring.
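N50 and L50, as reported above, and N75 (used in the pipefish abstract) follow one definition: sort sequence lengths from longest to shortest, accumulate them, and stop when the running total first reaches x% of the total assembly length. Nx is the length of the sequence at that point and Lx is how many sequences were needed. A minimal sketch with an illustrative function name:

```python
def nx_lx(lengths: list[int], x: float = 50.0) -> tuple[int, int]:
    """Return (Nx, Lx) for a set of contig or scaffold lengths.

    Nx: length of the sequence at which the cumulative sum of lengths,
    sorted longest first, first reaches x% of the total.
    Lx: number of sequences needed to reach that point.
    """
    if not lengths:
        raise ValueError("no sequences")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100.0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the full sum always reaches the target")
```

For example, lengths of 80, 70, 50, 40, 30, 20 and 10 give N50 = 70 with L50 = 2, and N75 = 40 with L75 = 4. A small L50 relative to the sequence count, as with 12 of 26 scaffolds above, indicates that a few long sequences carry half the assembly.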